Web Information Modeling, Extraction and Presentation
Authors
Abstract
The WWW Information Collection, Collaging and Programming (Wiccap) system is a software system for generating logical views of web resources and extracting desired information in the form of structured documents. It is designed to let people obtain information of interest simply and effectively, and to make information on the WWW accessible to applications, so as to support automation, interoperation and Web-awareness among services. In the Wiccap system, a data model is proposed to model information sources from a logical point of view, and a set of tools is developed to automate the construction of logical models of websites, to extract information according to the defined models, and to present the extracted information in an easily accessible manner.
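The abstract describes a pipeline in which a logical model of a website drives extraction into a structured document. The sketch below illustrates that idea only under stated assumptions: the abstract does not define Wiccap's data model, so the `LogicalNode` class, its begin/end regular-expression patterns, and the XML output are hypothetical stand-ins, not Wiccap's actual model or tools.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class LogicalNode:
    """Hypothetical node in a logical view of a website (not Wiccap's real schema)."""
    name: str
    begin: str = ""  # regex marking where this node's region starts ("" = start of text)
    end: str = ""    # regex marking where this node's region ends ("" = end of text)
    children: list[LogicalNode] = field(default_factory=list)


def cut_region(text: str, begin: str, end: str) -> str:
    """Return the text between the first `begin` match and the next `end` match."""
    if begin:
        m = re.search(begin, text)
        if m is None:
            return ""  # region not present on this page
        text = text[m.end():]
    if end:
        m = re.search(end, text)
        if m is not None:
            text = text[:m.start()]
    return text


def extract(node: LogicalNode, text: str) -> ET.Element:
    """Walk the logical model and build an XML element tree from the page text."""
    region = cut_region(text, node.begin, node.end)
    elem = ET.Element(node.name)
    if node.children:
        # Inner nodes only narrow the region; their children search inside it.
        for child in node.children:
            elem.append(extract(child, region))
    else:
        # Leaf nodes carry the extracted text.
        elem.text = region.strip()
    return elem


# Hypothetical logical model for a one-story news page.
news_model = LogicalNode(
    "Section", begin=r'<div id="news">', end=r"</div>",
    children=[
        LogicalNode("Title", begin=r"<h2>", end=r"</h2>"),
        LogicalNode("Body", begin=r"<p>", end=r"</p>"),
    ],
)

page = """<html><body>
<div id="news"><h2>Headline A</h2><p>Body A</p></div>
</body></html>"""

print(ET.tostring(extract(news_model, page), encoding="unicode"))
# prints <Section><Title>Headline A</Title><Body>Body A</Body></Section>
```

Separating the model (`news_model`) from the extractor (`extract`) mirrors the abstract's split between constructing logical models and extracting according to them: the same extractor can serve any site once a model for it exists.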
Similar resources
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
A Novel Method for Extracting Information from Web Pages with Multiple Presentation Templates
Web information extraction is the key part of web data integration. With the needs of e-commerce websites and the development of web design, web pages with multiple presentation templates have emerged. Current web information extraction systems are usually based on a single presentation template, so web pages with multiple presentation templates cannot be extracted efficiently. This paper focuses on th...
Information Extraction from HTML Documents Based on Logical Document Structure
The World Wide Web presents the largest Internet source of information from a broad range of areas. The web documents are mostly written in the Hypertext Markup Language (HTML) that doesn’t contain any means for semantic description of the content and thus the contained information cannot be processed directly. Current approaches for the information extraction from HTML are mostly based on wrap...
Web Information Extraction and User Modeling: Towards Closing the Gap
Web search engines have become the primary method of accessing information on the web. Billions of queries are submitted to major web search engines, reflecting a wide range of information needs. While significant progress has been made on improving the relevance of the results, the web search process often remains a frustrating experience. At the same time, web information extraction has seen trem...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Types and Roles of Ontologies in Web Information Extraction
We discuss the diverse types and roles of ontologies in web information extraction and illustrate them on a small study from the product offer domain. Attention is mainly paid to the impact of domain ontologies, presentation ontologies and terminological taxonomies.